\subsection{Related Work}
We now review several related works that use secure computation techniques. With respect to functionality the closest related work is that of Blanton and Aguiar\cite{ASIACCS:BlaAgu12} which describes a relatively complete set of protocols for performing intersections, unions, set difference, etc. and the corresponding SQL-like operations. Moreover, these operations are composable in that the inputs and outputs are secret shared between the parties. At the core of their technique is the use of a generic MPC protocol and an oblivious sorting algorithm that merges the two sets. This is followed by a linear pass over the sorted data where a relation is performed on adjacent items. Their technique has the advantage of being very general. However, the proposed algorithm has complexity $O(n \log^2 n)$, is not constant round,  and also requires unique join keys. 
\iffullversion
As a result, the implementation from 2011 performed poorly by current standards, intersecting $2^{10}$ items in 12 seconds. The modern protocol \cite{CCS:KKRT16} which is \emph{not composable} can perform intersections of $2^{20}$ items in 4 seconds. While this difference of three orders of magnitude would narrow if reimplemented using modern techniques, the gap would remain large.
\else
This results in poor concrete performance as shown in \sectionref{sec:eval}.
\fi


\iffullversion
% Private Set Intersection:Are Garbled Circuits Better than Custom Protocols? 
%     https://ssltest.cs.umd.edu/~jkatz/papers/psi.pdf
%\item  
Huang, Evans and Katz\cite{HEK12} also described a set intersection protocol based on sorting. Unlike \cite{ASIACCS:BlaAgu12}, this work considers the two party setting where each party holds a set in the clear.
This requirement prevents the protocol from being composable but allows the complexity to be reduced to $O(n\log n)$. The key idea is that each party locally sorts their set followed by merging the sets within MPC. The protocol can then perform a single pass over the sorted data to construct the intersection. While this results in performance improvements the overall protocol requires  $O(n\log n)$ operations and is not composable.
\fi

\iffullversion
% (KS06) Privacy-Preserving Set Operations
%     https://www.cs.cmu.edu/~leak/papers/set-tech-full.pdf
% (MF06) Efficient Polynomial Operations in the Shared-Coefficients Setting 
%     https://pdfs.semanticscholar.org/80ca/9f56cffce534e047d049884736ff16204958.pdf
%\item 
Another line of work was begun by Kissner and Song\cite{KS06} and improved on by \cite{MF06}. Their approach is based on the observation that set intersection and multi-set union have a correspondence to operations on polynomials. A set $S$ can be encoded as the polynomial $\hat S(x)= \prod_{s\in S}(x-s)\in \mathbb{F}[x]$. That is, the polynomial $\hat S(x)$ has a root at all $s\in S$. Given two such polynomials, $\hat S(x), \hat T(x)$, the polynomial encoding the intersection is $\hat S(x)+\hat T(x)$ with overwhelming probability given a sufficiently large field $\mathbb{F}$.

Multi-set union can similarly be performed by multiplying the two polynomials together. Unlike with normal union, if an item $y$ is contained in $S$ and $T$ then $\hat S(x)\hat T(x)$ will contain two roots at $y$ which is often not the desired functionality. This general idea can be transformed into a secure multi-party protocol using oblivious polynomial evaluation\cite{NP99} along with randomizing the result polynomial. The original computational overhead was $O(n^2)$ which can be reduced to the cost of polynomial interpolation $O(n\log n)$ using techniques from \cite{MF06}. The communication complexity is linear. In addition, this scheme assumes an ideal functionality to generate a shared Paillier key pair. We are unaware of any efficient protocol to realize this functionality except for \cite{RSA:HMRT12} in the two party setting.

This general approach is also composable. However, due to randomization that is performed the degree of the polynomial after each operation doubles. This limits the practical ability of the protocol to compose more than a few operations. Moreover, it is not clear how this protocol can be extended to support SQL-like queries where elements are key-value tuples. This general approach is also composable but incurs a 2 times overhead for each successive operations and cannot be extended to SQL-like queries. 
\fi


\iffullversion
% C. Hazay and K. Nissim. Efficient set operations in the presence of malicious adversaries. In PKC, 2010.
%   http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.454.1521&rep=rep1&type=pdf
%\item 
Hazay and Nissim\cite{HN12} introduce a pair of protocols computing set intersection and union which are also based on oblivious polynomial evaluation where the roots of the polynomial encode a set. However, these protocols are restricted to the two party case and are not composable. The non-composability comes from the fact that only one party constructs a polynomial $\hat S(x)$ encoding their set $S$ while the other party obliviously evaluates it on each element in their set. The result of these evaluations are compared with zero\footnote{The real protocol is slightly more complicated than this.}. These protocols have linear overhead and can achieve security in the malicious setting.
\fi

%\item 
Pinkas, Schneider and Zohner \cite{usenix:PSZ14} introduced a paradigm for set intersection that combines a hash table technique known as cuckoo hashing with a randomized encoding technique using oblivious transfer. Due to the hashing technique, the problem is reduced to comparing a single item $x$ to a small set $\{y_1,...,y_m\}$. Oblivious transfer is then used to interactively compute the randomized encoding $x'$ of $x$ while the other party locally computes the encodings $\{ y_1',..., y_m' \}$. A plaintext intersection can then be perform directly on these encodings. With the use of several optimization\cite{USENIX:PSSZ15,PSZ16,CCS:KKRT16,OOS17} this paradigm is extremely efficient and can perform a set intersection using $O(n)$ calls to a random oracle and $O(n)$ communication. These protocols are \emph{not composable} since the input must be known in the clear. Making them composable is non-trivial and they would likely introduce a large overhead.
\iffullversion
More recently this approach has also been extended to the malicious setting \cite{CCS:RinRos17} and separately to have sublinear communication when one set is much larger than the other\cite{CLR17} .
\fi

% https://eprint.iacr.org/2011/429.pdf
%\item Laur, Willemson and Zhang\cite{LWZ11} \todo{....}

% https://eprint.iacr.org/2013/203.pdf
%\item
Laur, Talvista and Willemson\cite{LTW13} present techniques in the honest majority setting for composable joins, unions and  many other operations at the expense of information leakage. Consider two parties each with a sets $X,Y$. The parties first generate secret shares of the sets and then use a generic MPC protocol to apply a pseudorandom function (PRF) $F$ to the shared sets to compute $X' = \{F_k(x) \mid x\in X\}, Y'=\{F_k(y) \mid y\in Y\}$ where the key $k$ is uniformly sampled by the MPC protocol (i.e. neither party knows $k$). $X'$ and $Y'$ are then revealed to both parties who use this information to infer the intersection, union and many other SQL-like operations. This basic approach dates back to the first PSI protocols \cite{Mea86,HFH99} where the (oblivious) PRF was implemented using a Diffie-Hellman protocol. \cite{LTW13} extended this paradigm to allow the input sets to be secret shared as opposed to being known in the clear.

The primary limitation of this approach is that all operations require all parties to know $X'$ and $Y'$. This prevents the protocol from being composable without significant information leakage. In particular, the cardinality of $X'\cap Y'$ and the result of the \texttt{where} clause for each row is revealed. This is of particular concern when several dataset are being combined. Learning the size of the intersection or the union can represent significant information. For instance, in the threat log application the union of many sets are taken. Each of these unions would reveal how many unique logs the new set has. Alternatively, taking the join between a set of hospital patients and a set of HIV positive patients would reveal how many have HIV. When combined with other information it could lead to the ability to identify some or all of these patients. Beyond this, the provided three party implementation achieved relatively poor performance. A join between two tables of a million records is estimated to require one hour on their three benchmark machines\cite{LTW13}. Looking forward, our protocol can perform a similar join operation in 4 seconds while preventing all leakage.

Bater, Elliott, et al.\cite{SMCQL} describe a outsourced MPC protocol where a client sends a SQL query to one of the computational parties who runs a garbled circuit based protocol amongst themselves. They present optimizations where some the computation is performed outside the MPC. We leave it as future work to explore the application of our new techniques in their setting.

%\end{itemize}